Deciphering Undersegmented Ancient Scripts Using Phonetic Prior
Authors
Abstract
Most undeciphered lost languages exhibit two characteristics that pose significant decipherment challenges: (1) the scripts are not fully segmented into words; (2) the closest known language is not determined. We propose a decipherment model that handles both of these challenges by building on rich linguistic constraints reflecting consistent patterns in historical sound change. We capture the natural phonological geometry by learning character embeddings based on the International Phonetic Alphabet (IPA). The resulting generative framework jointly models word segmentation and cognate alignment, informed by phonological constraints. We evaluate the model on both deciphered languages (Gothic, Ugaritic) and an undeciphered one (Iberian). The experiments show that incorporating phonetic geometry leads to clear and consistent gains. Additionally, we propose a measure for language closeness which correctly identifies related languages for Gothic and Ugaritic. For Iberian, the method does not show strong evidence supporting Basque as a related language, concurring with the position favored by current scholarship.
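The idea of segmenting an unsegmented text while aligning each segment to a word in a known language can be illustrated with a small dynamic program. The sketch below is not the paper's generative model: it assumes a hand-written toy feature table (`FEATURES`) in place of learned IPA-based embeddings, and the function names `sub_cost`, `align_cost`, and `segment`, the unit insertion/deletion cost, and the example words are all hypothetical choices made for illustration.

```python
# Toy phonetic features (hypothetical, hand-written): (voiced, place, manner).
# The paper learns embeddings from IPA features; this table only stands in for them.
FEATURES = {
    "p": (0, "labial", "stop"),        "b": (1, "labial", "stop"),
    "t": (0, "alveolar", "stop"),      "d": (1, "alveolar", "stop"),
    "k": (0, "velar", "stop"),         "g": (1, "velar", "stop"),
    "s": (0, "alveolar", "fricative"), "z": (1, "alveolar", "fricative"),
    "m": (1, "labial", "nasal"),       "n": (1, "alveolar", "nasal"),
    "a": (1, "open", "vowel"),         "i": (1, "close", "vowel"),
    "u": (1, "close-back", "vowel"),
}


def sub_cost(x, y):
    """Fraction of features on which two sounds differ (0.0 to 1.0)."""
    fx, fy = FEATURES[x], FEATURES[y]
    return sum(a != b for a, b in zip(fx, fy)) / len(fx)


def align_cost(src, tgt):
    """Edit distance with feature-weighted substitutions; indels cost 1."""
    prev = [float(j) for j in range(len(tgt) + 1)]
    for i, x in enumerate(src, 1):
        cur = [float(i)]
        for j, y in enumerate(tgt, 1):
            cur.append(min(prev[j] + 1,                   # delete x
                           cur[j - 1] + 1,                # insert y
                           prev[j - 1] + sub_cost(x, y)))  # substitute
        prev = cur
    return prev[-1]


def segment(text, vocab, max_len=6):
    """Split text into spans, pair each span with its cheapest vocabulary
    word, and return (total_cost, [(span, word), ...]) with minimal cost."""
    best = [(0.0, [])] + [(float("inf"), [])] * len(text)
    for end in range(1, len(text) + 1):
        for start in range(max(0, end - max_len), end):
            span = text[start:end]
            cost, word = min((align_cost(span, w), w) for w in vocab)
            total = best[start][0] + cost
            if total < best[end][0]:
                best[end] = (total, best[start][1] + [(span, word)])
    return best[len(text)]
```

With the invented vocabulary `["pata", "kuni"]`, calling `segment("badaguni", ["pata", "kuni"])` pairs `bada` with `pata` and `guni` with `kuni`, because voicing-only differences such as b/p or g/k are cheaper than substitutions between less similar sounds. That is the intuition behind using phonetic geometry as a prior, shown here in a much cruder form than a learned model.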
Similar resources
Deciphering ancient rapid radiations.
A deeper phylogenetic understanding of ancient patterns of diversification would contribute to solving many problems in evolutionary biology, yet many of these phylogenies remain poorly resolved. Ancient rapid radiations pose a major challenge for phylogenetic analysis for two main reasons. First, the pattern to be deciphered, the order of divergence among lineages, tends to be supported by sma...
A Computational Approach To Deciphering Unknown Scripts
We propose and evaluate computational techniques for deciphering unknown scripts. We focus on the case in which an unfamiliar script encodes a known language. The decipherment of a brief document or inscription is driven by data about the spoken language. We consider which scripts are easy or hard to decipher, how much data is required, and whether the techniques are robust against language cha...
A Computational Phonetic Model for Indian Language Scripts
In spite of South Asia being one of the richest areas in terms of linguistic diversity, South Asian languages have a lot in common. For example, most of the major Indian languages use scripts which are derived from the ancient Brahmi script, have more or less the same arrangement of alphabet, are highly phonetic in nature and are very well organised. We have used this fact to build a computatio...
Human-Robot Coordination Using Scripts
This paper describes an extension of scripts, which have been used to control sequences of robot behavior, to facilitate human-robot coordination. The script mechanism permits the human to both conduct expected, complementary activities with the robot and to intervene opportunistically taking direct control. Scripts address the six major issues associated with human-robot coordination. They all...
Using Scripts for Reactive Planning
We describe a model-based representation for real-time planning in technical processes. This means a complete model of causal and temporal dependencies is developed for an application. The main difference to other planners is that scripts, an event-oriented representation formalism is used and that goals and actions are part of the knowledge base in order to reason about old goals and already s...
Journal
Journal title: Transactions of the Association for Computational Linguistics
Year: 2021
ISSN: 2307-387X
DOI: https://doi.org/10.1162/tacl_a_00354